Load the student scores for the test:
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 3988 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 5501 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
skim(test_scores)
| Name | test_scores |
| Number of rows | 8993 |
| Number of columns | 28 |
| _______________________ | |
| Column type frequency: | |
| character | 4 |
| numeric | 24 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| test_version | 0 | 1 | 3 | 4 | 0 | 2 | 0 |
| year | 0 | 1 | 7 | 7 | 0 | 9 | 0 |
| class | 8993 | 0 | NA | NA | 0 | 0 | 0 |
| AnonID | 0 | 1 | 9 | 9 | 0 | 8899 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| A1_B1 | 39 | 1.00 | 4.38 | 1.48 | 0 | 5.00 | 5.0 | 5.00 | 5 | ▁▁▁▁▇ |
| A2_B0 | 5500 | 0.39 | 4.23 | 1.73 | 0 | 5.00 | 5.0 | 5.00 | 5 | ▁▁▁▁▇ |
| A3_B2 | 211 | 0.98 | 3.62 | 2.24 | 0 | 0.00 | 5.0 | 5.00 | 5 | ▃▁▁▁▇ |
| A4_B3 | 132 | 0.99 | 4.00 | 1.43 | 0 | 3.00 | 5.0 | 5.00 | 5 | ▁▁▂▂▇ |
| A5_B4 | 375 | 0.96 | 4.20 | 1.83 | 0 | 5.00 | 5.0 | 5.00 | 5 | ▂▁▁▁▇ |
| A6_B5 | 1377 | 0.85 | 1.99 | 1.89 | 0 | 0.00 | 2.0 | 2.00 | 5 | ▇▇▁▁▅ |
| A7_B6 | 716 | 0.92 | 4.00 | 1.93 | 0 | 5.00 | 5.0 | 5.00 | 5 | ▂▁▁▁▇ |
| A8_B0 | 5500 | 0.39 | 3.03 | 2.44 | 0 | 0.00 | 5.0 | 5.00 | 5 | ▅▁▁▁▇ |
| A9_B8 | 199 | 0.98 | 4.37 | 1.66 | 0 | 5.00 | 5.0 | 5.00 | 5 | ▁▁▁▁▇ |
| A10_B9 | 244 | 0.97 | 3.62 | 1.90 | 0 | 2.50 | 5.0 | 5.00 | 5 | ▂▁▃▁▇ |
| A11_B0 | 5500 | 0.39 | 3.89 | 1.67 | 0 | 2.50 | 5.0 | 5.00 | 5 | ▁▁▂▁▇ |
| A12_B12 | 371 | 0.96 | 4.20 | 1.74 | 0 | 5.00 | 5.0 | 5.00 | 5 | ▂▁▁▁▇ |
| A13_B13 | 304 | 0.97 | 3.85 | 1.85 | 0 | 2.00 | 5.0 | 5.00 | 5 | ▂▂▁▁▇ |
| A14_B14 | 576 | 0.94 | 2.97 | 1.91 | 0 | 2.00 | 2.0 | 5.00 | 5 | ▃▆▁▁▇ |
| A15_B15 | 352 | 0.96 | 3.60 | 2.01 | 0 | 2.50 | 5.0 | 5.00 | 5 | ▂▁▂▁▇ |
| A16_B16 | 270 | 0.97 | 4.49 | 1.51 | 0 | 5.00 | 5.0 | 5.00 | 5 | ▁▁▁▁▇ |
| A17_B17 | 200 | 0.98 | 4.62 | 1.33 | 0 | 5.00 | 5.0 | 5.00 | 5 | ▁▁▁▁▇ |
| A18_B18 | 367 | 0.96 | 3.77 | 2.15 | 0 | 5.00 | 5.0 | 5.00 | 5 | ▂▁▁▁▇ |
| A19_B19 | 1044 | 0.88 | 3.17 | 2.41 | 0 | 0.00 | 5.0 | 5.00 | 5 | ▅▁▁▁▇ |
| A20_B20 | 920 | 0.90 | 2.37 | 2.50 | 0 | 0.00 | 0.0 | 5.00 | 5 | ▇▁▁▁▇ |
| Total | 0 | 1.00 | 69.65 | 21.55 | 0 | 58.50 | 74.0 | 85.50 | 100 | ▁▁▃▇▇ |
| A0_B7 | 4149 | 0.54 | 2.26 | 2.49 | 0 | 0.00 | 0.0 | 5.00 | 5 | ▇▁▁▁▆ |
| A0_B10 | 4515 | 0.50 | 2.77 | 1.73 | 0 | 1.25 | 2.5 | 4.38 | 5 | ▃▇▂▅▇ |
| A0_B11 | 4008 | 0.55 | 4.18 | 1.41 | 0 | 3.00 | 5.0 | 5.00 | 5 | ▁▂▁▁▇ |
There are some NAs in the data. These occur exclusively in the results from the post version of the test, where Moodle recorded a null response as NA (i.e. cases where students did not give an answer that could be graded).
For this analysis, we replace these NA scores with 0, treating a blank response as an incorrect answer.
test_scores <- bind_rows(
"pre" = test_scores_pre %>%
replace_names(old_names = test_versions$pre, new_names = test_versions$label),
"post" = test_scores_post %>%
mutate(across(matches("^B\\d"), ~ replace_na(., 0))) %>%
replace_names(old_names = test_versions$post, new_names = test_versions$label),
.id = "test_version"
) %>%
# we also don't want the Total column, as it skews the shading of the data summary tables below
select(-Total)
skim(test_scores)
| Name | test_scores |
| Number of rows | 8993 |
| Number of columns | 27 |
| _______________________ | |
| Column type frequency: | |
| character | 4 |
| numeric | 23 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| test_version | 0 | 1 | 3 | 4 | 0 | 2 | 0 |
| year | 0 | 1 | 7 | 7 | 0 | 9 | 0 |
| class | 8993 | 0 | NA | NA | 0 | 0 | 0 |
| AnonID | 0 | 1 | 9 | 9 | 0 | 8899 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| A1_B1 | 0 | 1.00 | 4.36 | 1.50 | 0 | 5.00 | 5.00 | 5.00 | 5 | ▁▁▁▁▇ |
| A2_B0 | 5500 | 0.39 | 4.23 | 1.73 | 0 | 5.00 | 5.00 | 5.00 | 5 | ▁▁▁▁▇ |
| A3_B2 | 0 | 1.00 | 3.53 | 2.28 | 0 | 0.00 | 5.00 | 5.00 | 5 | ▃▁▁▁▇ |
| A4_B3 | 0 | 1.00 | 3.94 | 1.50 | 0 | 3.00 | 5.00 | 5.00 | 5 | ▁▁▂▂▇ |
| A5_B4 | 0 | 1.00 | 4.03 | 1.98 | 0 | 5.00 | 5.00 | 5.00 | 5 | ▂▁▁▁▇ |
| A6_B5 | 0 | 1.00 | 1.69 | 1.89 | 0 | 0.00 | 2.00 | 2.00 | 5 | ▇▆▁▁▃ |
| A7_B6 | 0 | 1.00 | 3.68 | 2.14 | 0 | 2.50 | 5.00 | 5.00 | 5 | ▃▁▁▁▇ |
| A8_B0 | 5500 | 0.39 | 3.03 | 2.44 | 0 | 0.00 | 5.00 | 5.00 | 5 | ▅▁▁▁▇ |
| A9_B8 | 0 | 1.00 | 4.27 | 1.77 | 0 | 5.00 | 5.00 | 5.00 | 5 | ▂▁▁▁▇ |
| A10_B9 | 0 | 1.00 | 3.52 | 1.97 | 0 | 2.50 | 5.00 | 5.00 | 5 | ▂▁▃▁▇ |
| A11_B0 | 5500 | 0.39 | 3.89 | 1.67 | 0 | 2.50 | 5.00 | 5.00 | 5 | ▁▁▂▁▇ |
| A12_B12 | 0 | 1.00 | 4.02 | 1.90 | 0 | 5.00 | 5.00 | 5.00 | 5 | ▂▁▁▁▇ |
| A13_B13 | 0 | 1.00 | 3.72 | 1.95 | 0 | 2.00 | 5.00 | 5.00 | 5 | ▂▂▁▁▇ |
| A14_B14 | 0 | 1.00 | 2.78 | 1.99 | 0 | 1.00 | 2.00 | 5.00 | 5 | ▅▆▁▁▇ |
| A15_B15 | 0 | 1.00 | 3.46 | 2.09 | 0 | 1.25 | 5.00 | 5.00 | 5 | ▃▁▂▁▇ |
| A16_B16 | 0 | 1.00 | 4.36 | 1.67 | 0 | 5.00 | 5.00 | 5.00 | 5 | ▁▁▁▁▇ |
| A17_B17 | 0 | 1.00 | 4.51 | 1.48 | 0 | 5.00 | 5.00 | 5.00 | 5 | ▁▁▁▁▇ |
| A18_B18 | 0 | 1.00 | 3.62 | 2.24 | 0 | 0.00 | 5.00 | 5.00 | 5 | ▃▁▁▁▇ |
| A19_B19 | 0 | 1.00 | 2.81 | 2.48 | 0 | 0.00 | 5.00 | 5.00 | 5 | ▆▁▁▁▇ |
| A20_B20 | 0 | 1.00 | 2.13 | 2.47 | 0 | 0.00 | 0.00 | 5.00 | 5 | ▇▁▁▁▆ |
| A0_B7 | 3493 | 0.61 | 1.99 | 2.45 | 0 | 0.00 | 0.00 | 5.00 | 5 | ▇▁▁▁▅ |
| A0_B10 | 3493 | 0.61 | 2.25 | 1.89 | 0 | 0.00 | 1.88 | 4.38 | 5 | ▇▆▂▃▇ |
| A0_B11 | 3493 | 0.61 | 3.78 | 1.81 | 0 | 2.00 | 5.00 | 5.00 | 5 | ▂▂▁▁▇ |
The number of responses from each cohort:
test_scores %>%
group_by(year) %>%
tally() %>%
gt() %>%
data_color(
columns = c("n"),
colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
)
| year | n |
|---|---|
| 2013-14 | 880 |
| 2014-15 | 1011 |
| 2015-16 | 777 |
| 2016-17 | 825 |
| 2017-18 | 926 |
| 2018-19 | 1159 |
| 2019-20 | 1237 |
| 2020-21 | 1182 |
| 2021-22 | 996 |
There is the same (roughly normal) distribution of raw scores each year:
total_scores <- test_scores %>% mutate(total = rowSums(across(where(is.numeric)), na.rm = TRUE))
total_scores %>%
ggplot(aes(x = total)) +
geom_histogram(binwidth = 2) +
facet_wrap(vars(year)) +
theme_minimal()
Mean and standard deviation for each item, broken down by year (note that this table is very wide):
test_scores %>%
select(-class, -test_version) %>%
group_by(year) %>%
skim_without_charts() %>%
select(-contains("character."), -contains("numeric.p"), -skim_type) %>%
rename(complete = complete_rate) %>%
# make the table wider, i.e. with separate columns for each year's results, with the year at the start of each column name
pivot_wider(names_from = year, values_from = -c(skim_variable, year), names_glue = "{year}__{.value}") %>%
# put the columns in order by year
select(sort(names(.))) %>%
select(skim_variable, everything()) %>%
# use GT to make the table look nice
gt(rowname_col = "skim_variable") %>%
# group the columns from each year
tab_spanner_delim(delim = "__") %>%
fmt_number(columns = contains("numeric"), decimals = 2) %>%
fmt_percent(columns = contains("complete"), decimals = 0) %>%
# change all the numeric.mean and numeric.sd column names to Mean and SD
cols_label(
.list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.mean"), label = "Mean") %>% deframe()
) %>%
cols_label(
.list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.sd"), label = "SD") %>% deframe()
) %>%
data_color(
columns = contains("numeric.mean"),
colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
)
| 2013-14 | 2014-15 | 2015-16 | 2016-17 | 2017-18 | 2018-19 | 2019-20 | 2020-21 | 2021-22 | ||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| complete | n_missing | Mean | SD | complete | n_missing | Mean | SD | complete | n_missing | Mean | SD | complete | n_missing | Mean | SD | complete | n_missing | Mean | SD | complete | n_missing | Mean | SD | complete | n_missing | Mean | SD | complete | n_missing | Mean | SD | complete | n_missing | Mean | SD | |
| AnonID | 100% | 0 | NA | NA | 100% | 0 | NA | NA | 100% | 0 | NA | NA | 100% | 0 | NA | NA | 100% | 0 | NA | NA | 100% | 0 | NA | NA | 100% | 0 | NA | NA | 100% | 0 | NA | NA | 100% | 0 | NA | NA |
| A1_B1 | 100% | 0 | 4.20 | 1.55 | 100% | 0 | 4.23 | 1.53 | 100% | 0 | 4.30 | 1.50 | 100% | 0 | 4.30 | 1.47 | 100% | 0 | 4.43 | 1.50 | 100% | 0 | 4.37 | 1.54 | 100% | 0 | 4.49 | 1.42 | 100% | 0 | 4.48 | 1.42 | 100% | 0 | 4.36 | 1.59 |
| A2_B0 | 100% | 0 | 4.18 | 1.78 | 100% | 0 | 4.15 | 1.79 | 100% | 0 | 4.33 | 1.65 | 100% | 0 | 4.28 | 1.69 | 0% | 926 | NaN | NA | 0% | 1159 | NaN | NA | 0% | 1237 | NaN | NA | 0% | 1182 | NaN | NA | 0% | 996 | NaN | NA |
| A3_B2 | 100% | 0 | 3.20 | 2.40 | 100% | 0 | 3.03 | 2.44 | 100% | 0 | 3.22 | 2.39 | 100% | 0 | 3.38 | 2.34 | 100% | 0 | 3.57 | 2.26 | 100% | 0 | 3.64 | 2.23 | 100% | 0 | 3.85 | 2.10 | 100% | 0 | 3.79 | 2.14 | 100% | 0 | 3.83 | 2.12 |
| A4_B3 | 100% | 0 | 3.55 | 1.56 | 100% | 0 | 3.59 | 1.57 | 100% | 0 | 3.72 | 1.56 | 100% | 0 | 3.87 | 1.52 | 100% | 0 | 4.03 | 1.47 | 100% | 0 | 4.06 | 1.43 | 100% | 0 | 4.07 | 1.47 | 100% | 0 | 4.17 | 1.41 | 100% | 0 | 4.21 | 1.40 |
| A5_B4 | 100% | 0 | 3.85 | 2.10 | 100% | 0 | 3.92 | 2.06 | 100% | 0 | 4.05 | 1.96 | 100% | 0 | 4.07 | 1.95 | 100% | 0 | 3.93 | 2.05 | 100% | 0 | 4.03 | 1.98 | 100% | 0 | 4.07 | 1.95 | 100% | 0 | 4.20 | 1.83 | 100% | 0 | 4.05 | 1.96 |
| A6_B5 | 100% | 0 | 1.37 | 1.82 | 100% | 0 | 1.40 | 1.81 | 100% | 0 | 1.70 | 1.95 | 100% | 0 | 1.88 | 1.97 | 100% | 0 | 1.73 | 1.82 | 100% | 0 | 1.65 | 1.83 | 100% | 0 | 1.75 | 1.86 | 100% | 0 | 1.82 | 1.90 | 100% | 0 | 1.85 | 1.96 |
| A7_B6 | 100% | 0 | 3.36 | 2.27 | 100% | 0 | 3.22 | 2.32 | 100% | 0 | 3.46 | 2.24 | 100% | 0 | 3.84 | 2.07 | 100% | 0 | 3.77 | 2.09 | 100% | 0 | 3.80 | 2.06 | 100% | 0 | 3.90 | 2.02 | 100% | 0 | 3.90 | 2.01 | 100% | 0 | 3.69 | 2.14 |
| A8_B0 | 100% | 0 | 2.80 | 2.48 | 100% | 0 | 2.82 | 2.48 | 100% | 0 | 3.13 | 2.42 | 100% | 0 | 3.43 | 2.32 | 0% | 926 | NaN | NA | 0% | 1159 | NaN | NA | 0% | 1237 | NaN | NA | 0% | 1182 | NaN | NA | 0% | 996 | NaN | NA |
| A9_B8 | 100% | 0 | 4.05 | 1.96 | 100% | 0 | 4.13 | 1.89 | 100% | 0 | 4.16 | 1.87 | 100% | 0 | 4.32 | 1.72 | 100% | 0 | 4.20 | 1.83 | 100% | 0 | 4.37 | 1.66 | 100% | 0 | 4.37 | 1.66 | 100% | 0 | 4.40 | 1.63 | 100% | 0 | 4.32 | 1.71 |
| A10_B9 | 100% | 0 | 3.01 | 2.10 | 100% | 0 | 3.06 | 2.09 | 100% | 0 | 3.14 | 2.05 | 100% | 0 | 3.35 | 2.04 | 100% | 0 | 3.62 | 1.87 | 100% | 0 | 3.63 | 1.90 | 100% | 0 | 3.73 | 1.89 | 100% | 0 | 3.90 | 1.79 | 100% | 0 | 3.90 | 1.79 |
| A11_B0 | 100% | 0 | 3.84 | 1.66 | 100% | 0 | 3.90 | 1.67 | 100% | 0 | 3.89 | 1.67 | 100% | 0 | 3.94 | 1.70 | 0% | 926 | NaN | NA | 0% | 1159 | NaN | NA | 0% | 1237 | NaN | NA | 0% | 1182 | NaN | NA | 0% | 996 | NaN | NA |
| A12_B12 | 100% | 0 | 3.73 | 2.08 | 100% | 0 | 3.62 | 2.14 | 100% | 0 | 3.83 | 2.02 | 100% | 0 | 3.96 | 1.95 | 100% | 0 | 4.12 | 1.82 | 100% | 0 | 4.14 | 1.80 | 100% | 0 | 4.21 | 1.76 | 100% | 0 | 4.23 | 1.73 | 100% | 0 | 4.19 | 1.77 |
| A13_B13 | 100% | 0 | 3.46 | 2.09 | 100% | 0 | 3.39 | 2.10 | 100% | 0 | 3.56 | 2.00 | 100% | 0 | 3.75 | 1.96 | 100% | 0 | 3.68 | 1.94 | 100% | 0 | 3.81 | 1.89 | 100% | 0 | 3.87 | 1.87 | 100% | 0 | 3.94 | 1.81 | 100% | 0 | 3.83 | 1.90 |
| A14_B14 | 100% | 0 | 2.27 | 1.96 | 100% | 0 | 2.33 | 1.97 | 100% | 0 | 2.65 | 1.98 | 100% | 0 | 2.80 | 2.00 | 100% | 0 | 2.82 | 1.96 | 100% | 0 | 2.87 | 1.93 | 100% | 0 | 3.01 | 1.96 | 100% | 0 | 3.01 | 1.96 | 100% | 0 | 3.04 | 2.01 |
| A15_B15 | 100% | 0 | 3.34 | 2.13 | 100% | 0 | 3.24 | 2.15 | 100% | 0 | 3.43 | 2.09 | 100% | 0 | 3.40 | 2.17 | 100% | 0 | 3.50 | 2.07 | 100% | 0 | 3.49 | 2.06 | 100% | 0 | 3.42 | 2.12 | 100% | 0 | 3.71 | 1.98 | 100% | 0 | 3.56 | 2.09 |
| A16_B16 | 100% | 0 | 4.24 | 1.79 | 100% | 0 | 4.13 | 1.89 | 100% | 0 | 4.25 | 1.79 | 100% | 0 | 4.31 | 1.73 | 100% | 0 | 4.42 | 1.61 | 100% | 0 | 4.40 | 1.63 | 100% | 0 | 4.38 | 1.65 | 100% | 0 | 4.45 | 1.57 | 100% | 0 | 4.58 | 1.39 |
| A17_B17 | 100% | 0 | 4.39 | 1.63 | 100% | 0 | 4.26 | 1.78 | 100% | 0 | 4.42 | 1.60 | 100% | 0 | 4.39 | 1.64 | 100% | 0 | 4.62 | 1.33 | 100% | 0 | 4.58 | 1.39 | 100% | 0 | 4.60 | 1.36 | 100% | 0 | 4.62 | 1.33 | 100% | 0 | 4.64 | 1.30 |
| A18_B18 | 100% | 0 | 3.23 | 2.39 | 100% | 0 | 3.17 | 2.41 | 100% | 0 | 3.41 | 2.33 | 100% | 0 | 3.54 | 2.28 | 100% | 0 | 3.70 | 2.20 | 100% | 0 | 3.65 | 2.22 | 100% | 0 | 3.84 | 2.11 | 100% | 0 | 3.87 | 2.09 | 100% | 0 | 3.95 | 2.04 |
| A19_B19 | 100% | 0 | 2.14 | 2.47 | 100% | 0 | 2.08 | 2.47 | 100% | 0 | 2.43 | 2.50 | 100% | 0 | 2.76 | 2.49 | 100% | 0 | 2.86 | 2.48 | 100% | 0 | 3.05 | 2.44 | 100% | 0 | 3.12 | 2.42 | 100% | 0 | 3.22 | 2.39 | 100% | 0 | 3.24 | 2.39 |
| A20_B20 | 100% | 0 | 1.70 | 2.37 | 100% | 0 | 1.67 | 2.36 | 100% | 0 | 1.86 | 2.42 | 100% | 0 | 2.27 | 2.49 | 100% | 0 | 2.20 | 2.48 | 100% | 0 | 2.29 | 2.49 | 100% | 0 | 2.34 | 2.50 | 100% | 0 | 2.23 | 2.49 | 100% | 0 | 2.43 | 2.50 |
| A0_B7 | 0% | 880 | NaN | NA | 0% | 1011 | NaN | NA | 0% | 777 | NaN | NA | 0% | 825 | NaN | NA | 100% | 0 | 1.88 | 2.42 | 100% | 0 | 1.90 | 2.43 | 100% | 0 | 1.98 | 2.45 | 100% | 0 | 2.14 | 2.47 | 100% | 0 | 2.03 | 2.46 |
| A0_B10 | 0% | 880 | NaN | NA | 0% | 1011 | NaN | NA | 0% | 777 | NaN | NA | 0% | 825 | NaN | NA | 100% | 0 | 1.99 | 1.84 | 100% | 0 | 2.06 | 1.84 | 100% | 0 | 2.33 | 1.90 | 100% | 0 | 2.47 | 1.92 | 100% | 0 | 2.37 | 1.93 |
| A0_B11 | 0% | 880 | NaN | NA | 0% | 1011 | NaN | NA | 0% | 777 | NaN | NA | 0% | 825 | NaN | NA | 100% | 0 | 3.72 | 1.85 | 100% | 0 | 3.76 | 1.80 | 100% | 0 | 3.84 | 1.78 | 100% | 0 | 3.82 | 1.80 | 100% | 0 | 3.76 | 1.85 |
Before applying IRT, we should check that the data satisfies the assumptions needed by the model. In particular, to use a 1-dimensional IRT model, we should have some evidence of unidimensionality in the test scores.
Since the combined data from the two versions of the test have large portions of “missing” data (e.g. no responses to new questions from students who completed the old test), it is not possible to carry out the factor analysis as in the analyse-test script: the factor analysis routine does not work with missing data.
Instead, in the next section we proceed directly to fitting the IRT model, and using the \(Q_3\) check for local independence. In the final section, we also run a factor analysis for the data from the new test only.
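As a quick check of that structural missingness, we can tabulate item completeness by test version: whole blocks of items are missing for one version or the other, rather than values being scattered at random. This is a sketch, assuming the combined test_scores data frame built above:

```r
library(dplyr)

# proportion of non-missing responses for each item, split by test version;
# items unique to one version are (nearly) fully missing in the other
test_scores %>%
  group_by(test_version) %>%
  summarise(across(matches("^A\\d"), ~ mean(!is.na(.x))))
```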
The mirt implementation of the generalised partial credit model (gpcm) requires that the partial marks are consecutive integers. We therefore work around this by recoding our scores into that form (e.g. replacing scores of 0, 2.5, 5 with 1, 2, 3), while keeping track of the true score attached to each mark level so that we can properly compute expected scores later on.
# save just the matrix of scores
item_scores <- test_scores %>%
select(matches("^A\\d"))
# Determine the mark levels for each item
mark_levels <- item_scores %>%
pivot_longer(everything(), names_to = "item", values_to = "score") %>%
distinct() %>%
# we don't want to give a level to NA
filter(!is.na(score)) %>%
arrange(parse_number(item), score) %>%
group_by(item) %>%
mutate(order = row_number()) %>%
# Note that the convention used by mirt is for items that have only 2 levels (i.e. 0 marks or full marks),
# the columns are P.0 and P.1, while other items are indexed from 1, i.e. P.1, P.2, ...
# https://github.com/philchalmers/mirt/blob/accd2383b9a4d17a4cab269717ce98434900b62c/R/probtrace.R#L57
mutate(level = case_when(
max(order) == 2 ~ order - 1,
TRUE ~ order * 1.0
)) %>%
mutate(levelname = paste0(item, ".P.", level))
# Use the mark_levels table to replace scores with levels
# (first pivot the data to long form, make the replacement, then pivot back to wide again)
item_scores_levelled <- item_scores %>%
# temporarily add row identifiers
mutate(row = row_number()) %>%
pivot_longer(cols = -row, names_to = "item", values_to = "score") %>%
left_join(mark_levels %>% select(item, score, level), by = c("item", "score")) %>%
select(-score) %>%
pivot_wider(names_from = "item", values_from = "level") %>%
select(-row)
irt_fit <- mirt(
data = item_scores_levelled, # just the columns with question scores
model = 1, # number of factors to extract
itemtype = "gpcm", # generalised partial credit model
SE = TRUE # estimate standard errors
)
##
Iteration: 1, Log-Lik: -145717.510, Max-Change: 2.56485
Iteration: 2, Log-Lik: -130984.841, Max-Change: 1.05305
Iteration: 3, Log-Lik: -129044.431, Max-Change: 0.26154
Iteration: 4, Log-Lik: -128798.276, Max-Change: 0.15657
Iteration: 5, Log-Lik: -128734.230, Max-Change: 0.15905
Iteration: 6, Log-Lik: -128705.150, Max-Change: 0.14087
Iteration: 7, Log-Lik: -128683.347, Max-Change: 0.08365
Iteration: 8, Log-Lik: -128671.511, Max-Change: 0.05435
Iteration: 9, Log-Lik: -128662.797, Max-Change: 0.01875
Iteration: 10, Log-Lik: -128658.284, Max-Change: 0.02391
Iteration: 11, Log-Lik: -128655.946, Max-Change: 0.02953
Iteration: 12, Log-Lik: -128654.673, Max-Change: 0.02986
Iteration: 13, Log-Lik: -128653.305, Max-Change: 0.00863
Iteration: 14, Log-Lik: -128652.870, Max-Change: 0.00219
Iteration: 15, Log-Lik: -128652.770, Max-Change: 0.00167
Iteration: 16, Log-Lik: -128652.637, Max-Change: 0.00120
Iteration: 17, Log-Lik: -128652.615, Max-Change: 0.00012
Iteration: 18, Log-Lik: -128652.613, Max-Change: 0.00025
Iteration: 19, Log-Lik: -128652.612, Max-Change: 0.00016
Iteration: 20, Log-Lik: -128652.610, Max-Change: 0.00031
Iteration: 21, Log-Lik: -128652.610, Max-Change: 0.00013
Iteration: 22, Log-Lik: -128652.609, Max-Change: 0.00011
Iteration: 23, Log-Lik: -128652.608, Max-Change: 0.00021
Iteration: 24, Log-Lik: -128652.607, Max-Change: 0.00008
##
## Calculating information matrix...
We compute Yen’s (1984) \(Q_3\) statistic to check for any residual dependence between items after controlling for \(\theta\). This gives a score for each pair of items, with absolute values above 0.2 regarded as problematic (see DeMars, p. 48).
# compute the Q3 residual correlations from the fitted model
irt_residuals <- residuals(irt_fit, type = "Q3") %>% as.data.frame()
irt_residuals %>% as.matrix() %>%
corrplot::corrplot(type = "upper")
This shows that most item pairs are independent, with only one showing cause for concern:
irt_residuals %>%
rownames_to_column(var = "item1") %>%
as_tibble() %>%
pivot_longer(cols = starts_with("A"), names_to = "item2", values_to = "Q3_score") %>%
filter(abs(Q3_score) > 0.2) %>%
filter(parse_number(item1) < parse_number(item2)) %>%
gt()
| item1 | item2 | Q3_score |
|---|---|---|
| A16_B16 | A17_B17 | 0.25521 |
This pair of questions on variations of the chain rule was also identified by the analysis of the first version of the test.
Given that this violation of the local independence assumption is very mild, we proceed using this model.
We then compute factor score estimates and add them to the existing data frame, to keep everything in one place. We use the fscores() function from the mirt package, which takes a fitted model object and computes factor score estimates according to the specified method. Here we use the default method, EAP (“expected a posteriori”).
test_scores <- test_scores %>%
mutate(F1 = fscores(irt_fit, method = "EAP"))
We can also extract the model coefficient estimates using the generic coef() function. We set the IRTpars argument to TRUE so that the slope-intercept parameters are converted into traditional IRT parameters.
irt_coefs <- coef(irt_fit, IRTpars = TRUE)
The resulting object irt_coefs is a list, with one element for each question, and an additional GroupPars element that we won’t be using. For each question, the object records several values:

- a is discrimination
- b is difficulty

To make this output a little more user-friendly, we use the tidy_mirt_coefs function that we have provided in common-functions.R to produce a single data frame with a row for each question.
tidy_mirt_coefs <- function(x){
x %>%
# melt the list element
melt() %>%
# convert to a tibble
as_tibble() %>%
# convert factors to characters
mutate(across(where(is.factor), as.character)) %>%
# only focus on rows where X2 is a, or starts with b (the parameters in the GPCM)
filter(X2 == "a" | str_detect(X2, "^b")) %>%
# in X1, relabel par (parameter) as est (estimate)
mutate(X1 = if_else(X1 == "par", "est", X1)) %>%
# turn into a wider data frame
pivot_wider(names_from = X1, values_from = value) %>%
rename(par = X2)
}
# use head(., -1) to remove the last element, `GroupPars`, which does not correspond to a question
tidy_irt_coefs <- map_dfr(head(irt_coefs, -1), tidy_mirt_coefs, .id = "Question")
Here is a nicely formatted table of the result:
tidy_irt_coefs %>%
filter(par == "a") %>%
select(-par) %>%
rename_with(.fn = ~ paste0("a_", .x), .cols = -Question) %>%
left_join(
tidy_irt_coefs %>%
filter(str_detect(par, "^b")),
by = "Question"
) %>%
gt(groupname_col = "Question") %>%
fmt_number(columns = matches("est|_"), decimals = 3) %>%
data_color(
columns = contains("a_"),
colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
) %>%
data_color(
columns = c("est", "CI_2.5", "CI_97.5"),
colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
) %>%
tab_spanner(label = "Discrimination", columns = contains("a_")) %>%
tab_spanner(label = "Difficulty", columns = c("par", "est", "CI_2.5", "CI_97.5"))
| Discrimination | Difficulty | |||||
|---|---|---|---|---|---|---|
| a_est | a_CI_2.5 | a_CI_97.5 | par | est | CI_2.5 | CI_97.5 |
| A1_B1 | ||||||
| 0.6498042 | 0.6039469 | 0.6956614 | b1 | -0.71654270 | -0.8802811 | -0.55280428 |
| 0.6498042 | 0.6039469 | 0.6956614 | b2 | -3.61893119 | -3.8882574 | -3.34960494 |
| A2_B0 | ||||||
| 0.6301155 | 0.5663522 | 0.6938787 | b1 | 1.01263690 | 0.6296772 | 1.39559661 |
| 0.6301155 | 0.5663522 | 0.6938787 | b2 | -5.03660052 | -5.5871736 | -4.48602742 |
| A3_B2 | ||||||
| 1.3962706 | 1.3137024 | 1.4788388 | b1 | -0.81419853 | -0.8677811 | -0.76061599 |
| A4_B3 | ||||||
| 0.2564078 | 0.2424465 | 0.2703690 | b1 | 17.55677290 | 13.3994024 | 21.71414340 |
| 0.2564078 | 0.2424465 | 0.2703690 | b2 | -18.53411144 | -22.6679577 | -14.40026514 |
| 0.2564078 | 0.2424465 | 0.2703690 | b3 | 2.30865874 | 1.4760844 | 3.14123307 |
| 0.2564078 | 0.2424465 | 0.2703690 | b4 | 3.71818628 | 2.2937500 | 5.14262258 |
| 0.2564078 | 0.2424465 | 0.2703690 | b5 | -10.43564199 | -11.8119677 | -9.05931629 |
| 0.2564078 | 0.2424465 | 0.2703690 | b6 | 0.64973553 | 0.1011823 | 1.19828876 |
| 0.2564078 | 0.2424465 | 0.2703690 | b7 | -3.31025417 | -3.8333985 | -2.78710984 |
| 0.2564078 | 0.2424465 | 0.2703690 | b8 | 4.43086127 | 3.7811680 | 5.08055453 |
| 0.2564078 | 0.2424465 | 0.2703690 | b9 | -1.93539486 | -2.6062557 | -1.26453399 |
| 0.2564078 | 0.2424465 | 0.2703690 | b10 | -3.05857448 | -3.5832469 | -2.53390204 |
| 0.2564078 | 0.2424465 | 0.2703690 | b11 | 7.96868276 | 7.0562188 | 8.88114668 |
| 0.2564078 | 0.2424465 | 0.2703690 | b12 | -14.92629639 | -16.0585829 | -13.79400986 |
| A5_B4 | ||||||
| 1.3628424 | 1.2775306 | 1.4481542 | b1 | -1.34560886 | -1.4195075 | -1.27171021 |
| A6_B5 | ||||||
| 0.4897225 | 0.4582114 | 0.5212336 | b1 | 0.51215829 | 0.4016310 | 0.62268555 |
| 0.4897225 | 0.4582114 | 0.5212336 | b2 | 8.50824198 | 7.7403844 | 9.27609958 |
| 0.4897225 | 0.4582114 | 0.5212336 | b3 | -6.67190462 | -7.4165123 | -5.92729694 |
| A7_B6 | ||||||
| 0.8893270 | 0.8383670 | 0.9402870 | b1 | 1.33395065 | 1.1624063 | 1.50549501 |
| 0.8893270 | 0.8383670 | 0.9402870 | b2 | -3.06069337 | -3.2718360 | -2.84955076 |
| A8_B0 | ||||||
| 0.8708402 | 0.7763904 | 0.9652901 | b1 | -0.73764157 | -0.8383570 | -0.63692616 |
| A9_B8 | ||||||
| 1.8753874 | 1.7586786 | 1.9920962 | b1 | -1.39576942 | -1.4602993 | -1.33123955 |
| A10_B9 | ||||||
| 0.8977396 | 0.8472517 | 0.9482275 | b1 | -0.69447234 | -0.7746887 | -0.61425600 |
| 0.8977396 | 0.8472517 | 0.9482275 | b2 | -1.05314133 | -1.1438435 | -0.96243917 |
| A11_B0 | ||||||
| 0.5284065 | 0.4825679 | 0.5742452 | b1 | -0.05164365 | -0.4559668 | 0.35267952 |
| 0.5284065 | 0.4825679 | 0.5742452 | b2 | -3.64927273 | -4.0551738 | -3.24337165 |
| 0.5284065 | 0.4825679 | 0.5742452 | b3 | 5.01032412 | 4.2468686 | 5.77377959 |
| 0.5284065 | 0.4825679 | 0.5742452 | b4 | -7.63488521 | -8.5220153 | -6.74775508 |
| A12_B12 | ||||||
| 0.6690175 | 0.6297765 | 0.7082585 | b1 | 1.33006553 | 1.0966172 | 1.56351386 |
| 0.6690175 | 0.6297765 | 0.7082585 | b2 | -0.54793330 | -0.7719983 | -0.32386835 |
| 0.6690175 | 0.6297765 | 0.7082585 | b3 | -4.31202385 | -4.6122088 | -4.01183889 |
| A13_B13 | ||||||
| 0.6687580 | 0.6313361 | 0.7061800 | b1 | -0.53674184 | -0.6571816 | -0.41630213 |
| 0.6687580 | 0.6313361 | 0.7061800 | b2 | 2.96793195 | 2.6506158 | 3.28524810 |
| 0.6687580 | 0.6313361 | 0.7061800 | b3 | -5.41392753 | -5.8108857 | -5.01696940 |
| A14_B14 | ||||||
| 0.6530867 | 0.6192677 | 0.6869057 | b1 | 1.42504291 | 1.2260140 | 1.62407186 |
| 0.6530867 | 0.6192677 | 0.6869057 | b2 | -3.15568713 | -3.3663622 | -2.94501210 |
| 0.6530867 | 0.6192677 | 0.6869057 | b3 | 7.34229727 | 6.6044972 | 8.08009735 |
| 0.6530867 | 0.6192677 | 0.6869057 | b4 | -3.32992312 | -4.0132797 | -2.64656652 |
| 0.6530867 | 0.6192677 | 0.6869057 | b5 | -3.60980610 | -3.9094050 | -3.31020717 |
| A15_B15 | ||||||
| 0.2996588 | 0.2806140 | 0.3187035 | b1 | 7.49204312 | 6.7623355 | 8.22175071 |
| 0.2996588 | 0.2806140 | 0.3187035 | b2 | -5.70096594 | -6.3245982 | -5.07733373 |
| 0.2996588 | 0.2806140 | 0.3187035 | b3 | 3.24135698 | 2.7820672 | 3.70064674 |
| 0.2996588 | 0.2806140 | 0.3187035 | b4 | -8.93023517 | -9.6180731 | -8.24239727 |
| A16_B16 | ||||||
| 2.0821843 | 1.9491758 | 2.2151928 | b1 | -1.43998301 | -1.5033635 | -1.37660250 |
| A17_B17 | ||||||
| 2.4983673 | 2.3262411 | 2.6704934 | b1 | -1.55154531 | -1.6147999 | -1.48829074 |
| A18_B18 | ||||||
| 1.3671530 | 1.2852785 | 1.4490275 | b1 | -0.90548778 | -0.9627000 | -0.84827558 |
| A19_B19 | ||||||
| 1.8448311 | 1.7429463 | 1.9467160 | b1 | -0.17533493 | -0.2107730 | -0.13989687 |
| A20_B20 | ||||||
| 2.0506549 | 1.9358421 | 2.1654677 | b1 | 0.25977422 | 0.2258841 | 0.29366439 |
| A0_B7 | ||||||
| 1.1256167 | 1.0340423 | 1.2171911 | b1 | 0.58126267 | 0.5184027 | 0.64412266 |
| A0_B10 | ||||||
| 0.4346299 | 0.4061118 | 0.4631480 | b1 | 4.66604020 | 4.1482123 | 5.18386809 |
| 0.4346299 | 0.4061118 | 0.4631480 | b2 | -4.46992913 | -4.9466160 | -3.99324220 |
| 0.4346299 | 0.4061118 | 0.4631480 | b3 | 2.15641782 | 1.8421915 | 2.47064414 |
| 0.4346299 | 0.4061118 | 0.4631480 | b4 | -0.16165124 | -0.4860565 | 0.16275401 |
| 0.4346299 | 0.4061118 | 0.4631480 | b5 | 0.99733832 | 0.6554328 | 1.33924388 |
| 0.4346299 | 0.4061118 | 0.4631480 | b6 | -0.27429274 | -0.6151218 | 0.06653628 |
| 0.4346299 | 0.4061118 | 0.4631480 | b7 | 1.80937930 | 1.4514936 | 2.16726504 |
| 0.4346299 | 0.4061118 | 0.4631480 | b8 | -2.37031429 | -2.7532597 | -1.98736888 |
| A0_B11 | ||||||
| 0.4415561 | 0.4104003 | 0.4727118 | b1 | 2.19821926 | 1.7304127 | 2.66602585 |
| 0.4415561 | 0.4104003 | 0.4727118 | b2 | -4.72103582 | -5.1985394 | -4.24353222 |
| 0.4415561 | 0.4104003 | 0.4727118 | b3 | 8.08197275 | 7.0158913 | 9.14805423 |
| 0.4415561 | 0.4104003 | 0.4727118 | b4 | -3.02649470 | -4.0273930 | -2.02559636 |
| 0.4415561 | 0.4104003 | 0.4727118 | b5 | -7.79781171 | -8.5282818 | -7.06734158 |
These values are also saved to the file output/uoe_pre-vs-post_gpcm-results.csv.
tidy_irt_coefs %>%
write_csv("output/uoe_pre-vs-post_gpcm-results.csv")
theta <- seq(-6, 6, by=0.05)
info_matrix <- testinfo(irt_fit, theta, individual = TRUE)
colnames(info_matrix) <- test_versions %>% pull(label)
item_info_data <- info_matrix %>%
as_tibble() %>%
bind_cols(theta = theta) %>%
pivot_longer(cols = -theta, names_to = "item", values_to = "info_y") %>%
left_join(test_versions %>% select(item = label, MATH_group), by = "item") %>%
mutate(item = fct_reorder(item, parse_number(item)))
item_info_data %>%
group_by(theta) %>%
summarise(info_y = sum(info_y)) %>%
ggplot(aes(x = theta, y = info_y)) +
geom_line() +
labs(y = "Information") +
theme_minimal()
ggsave("output/uoe_pre-vs-post_info.pdf", width = 6, height = 4, units = "in")
This shows that the information given by the test is skewed toward the lower end of the ability scale - i.e. it can give more accurate estimates of students’ ability where their ability level is slightly below the mean.
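The ability value at which the test is most informative can be located directly with slice_max() (a sketch, assuming the item_info_data computed above):

```r
library(dplyr)

# ability value at which the total test information is largest
item_info_data %>%
  group_by(theta) %>%
  summarise(info_y = sum(info_y), .groups = "drop") %>%
  slice_max(info_y, n = 1)
```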
Breaking this down by question helps to highlight those questions that are most/least informative:
item_info_data %>%
ggplot(aes(x = theta, y = info_y, colour = item)) +
geom_line() +
scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
facet_wrap(vars(item)) +
labs(y = "Information") +
theme_minimal()
We can also compute the sums of different subsets of the information curves – here, we will look at the questions based on their MATH group:
item_info_data %>%
group_by(theta) %>%
summarise(
items_all = sum(info_y),
items_A = sum(ifelse(MATH_group == "A", info_y, 0)),
items_B = sum(ifelse(MATH_group == "B", info_y, 0)),
items_C = sum(ifelse(MATH_group == "C", info_y, 0))
) %>%
pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y") %>%
mutate(items = fct_relevel(items, "all", "A", "B", "C")) %>%
ggplot(aes(x = theta, y = info_y, colour = items)) +
geom_line() +
scale_colour_manual(values = c("all" = "#000000", MATH_colours)) +
labs(x = "Ability", y = "Information") +
theme_minimal()
ggsave("output/uoe_pre-vs-post_info-curves_A-vs-B.pdf", units = "cm", width = 14, height = 6)
This shows that the information in the MATH Group B questions is at a higher point on the ability scale than for the MATH Group A questions.
Since the number of items in each group differs, we instead consider the mean information per item:
item_info_data %>%
group_by(theta) %>%
summarise(
items_all = sum(info_y) / n(),
items_A = sum(ifelse(MATH_group == "A", info_y, 0)) / sum(ifelse(MATH_group == "A", 1, 0)),
items_B = sum(ifelse(MATH_group == "B", info_y, 0)) / sum(ifelse(MATH_group == "B", 1, 0)),
items_C = sum(ifelse(MATH_group == "C", info_y, 0)) / sum(ifelse(MATH_group == "C", 1, 0))
) %>%
pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y") %>%
mutate(items = fct_relevel(items, "all", "A", "B", "C")) %>%
ggplot(aes(x = theta, y = info_y, colour = items)) +
geom_line() +
scale_colour_manual(values = c("all" = "#000000", MATH_colours)) +
labs(x = "Ability", y = "Mean information per item") +
theme_minimal()
ggsave("output/uoe_pre-vs-post_info-curves_A-vs-B-avg.pdf", units = "cm", width = 10, height = 6)
This shows that items of each MATH group are giving broadly similar levels of information on average, but at different points on the ability scale.
Using mirt’s areainfo function, we can find the total area under the information curves.
info_gpcm <- areainfo(irt_fit, c(-4,4))
info_gpcm %>% gt()
| LowerBound | UpperBound | Info | TotalInfo | Proportion | nitems |
|---|---|---|---|---|---|
| -4 | 4 | 42.02264 | 43.40405 | 0.9681734 | 23 |
This shows that the total information across all items is approximately 43.40.
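The number areainfo reports is simply the area under the test information curve between the stated bounds, which it computes by numerical integration. The idea can be sketched with a trapezoidal rule in base R (the information curve below is a toy stand-in, not the fitted model):

```r
# Trapezoidal approximation to the area under an information curve
trapezoid_area <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

theta <- seq(-4, 4, length.out = 1001)
info  <- 10 * dnorm(theta, mean = -0.5, sd = 1.2)  # toy information curve

# Area between -4 and 4; nearly all of the total area, which is
# what the Proportion column reported by areainfo expresses
trapezoid_area(theta, info)
```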
tidy_info <- test_versions %>%
mutate(item_num = row_number()) %>%
mutate(TotalInfo = purrr::map_dbl(
item_num,
~ areainfo(irt_fit,
c(-4, 4),
which.items = .x) %>% pull(TotalInfo)
))
tidy_info %>%
select(-item_num) %>%
arrange(-TotalInfo) %>%
#group_by(outcome) %>%
gt() %>%
fmt_number(columns = contains("a_"), decimals = 2) %>%
fmt_number(columns = contains("b_"), decimals = 2) %>%
data_color(
columns = contains("info"),
colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
) %>%
data_color(
columns = contains("outcome"),
colors = scales::col_factor(palette = c("viridis"), domain = NULL)
) %>%
cols_label(
TotalInfo = "Information"
)
| pre | post | MATH_group | description | label | outcome | Information |
|---|---|---|---|---|---|---|
| NA | B10 | B | 2x2 system possibilities | A0_B10 | added | 3.4740490 |
| A14 | B14 | B | find minimum gradient of cubic | A14_B14 | unchanged | 3.2649596 |
| A4 | B3 | A | completing the square | A4_B3 | unchanged | 3.0715660 |
| A17 | B17 | A | chain rule | A17_B17 | unchanged | 2.4983673 |
| NA | B11 | C | using max and min functions | A0_B11 | added | 2.2052047 |
| A11 | NA | A | summing arithmetic progression | A11_B0 | removed | 2.1106414 |
| A16 | B16 | A | trig chain rule | A16_B16 | unchanged | 2.0821843 |
| A20 | B20 | B | product rule with given values | A20_B20 | unchanged | 2.0506549 |
| A12 | B12 | A | find point with given slope | A12_B12 | unchanged | 2.0066633 |
| A13 | B13 | A | equation of tangent | A13_B13 | unchanged | 2.0050597 |
| A9 | B8 | A | simplify logs | A9_B8 | unchanged | 1.8753872 |
| A19 | B19 | B | area between curve and x-axis (in 2 parts) | A19_B19 | unchanged | 1.8448311 |
| A10 | B9 | B | identify graph of rational functions | A10_B9 | unchanged | 1.7952237 |
| A7 | B6 | B | graphical vector sum | A7_B6 | unchanged | 1.7786086 |
| A6 | B5 | A | trig wave function | A6_B5 | unchanged | 1.4660620 |
| A3 | B2 | B | composition of functions | A3_B2 | unchanged | 1.3962664 |
| A18 | B18 | A | definite integral | A18_B18 | unchanged | 1.3671471 |
| A5 | B4 | A | trig double angle formula | A5_B4 | unchanged | 1.3628319 |
| A1 | B1 | A | properties of fractions | A1_B1 | unchanged | 1.2979102 |
| A2 | NA | A | find intersection of lines | A2_B0 | removed | 1.2595188 |
| A15 | B15 | A | find and classify stationary points of cubic | A15_B15 | unchanged | 1.1948396 |
| NA | B7 | B | angle between 3d vectors (in context) | A0_B7 | added | 1.1255812 |
| A8 | NA | A | compute angle between 3d vectors | A8_B0 | removed | 0.8704912 |
Restricting instead to the range \(-2\leq\theta\leq2\):
tidy_info <- test_versions %>%
mutate(item_num = row_number()) %>%
mutate(TotalInfo = purrr::map_dbl(
item_num,
~ areainfo(irt_fit,
c(-2, 2),
which.items = .x) %>% pull(Info) # Info = area within the given bounds, unlike TotalInfo
))
tidy_info %>%
select(-item_num) %>%
arrange(-TotalInfo) %>%
#group_by(outcome) %>%
gt() %>%
fmt_number(columns = contains("a_"), decimals = 2) %>%
fmt_number(columns = contains("b_"), decimals = 2) %>%
data_color(
columns = contains("info"),
colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
) %>%
data_color(
columns = contains("outcome"),
colors = scales::col_factor(palette = c("viridis"), domain = NULL)
) %>%
cols_label(
TotalInfo = "Information"
)
| pre | post | MATH_group | description | label | outcome | Information |
|---|---|---|---|---|---|---|
| NA | B10 | B | 2x2 system possibilities | A0_B10 | added | 3.0175812 |
| A14 | B14 | B | find minimum gradient of cubic | A14_B14 | unchanged | 2.9111803 |
| A4 | B3 | A | completing the square | A4_B3 | unchanged | 2.1982949 |
| A20 | B20 | B | product rule with given values | A20_B20 | unchanged | 1.9746900 |
| A17 | B17 | A | chain rule | A17_B17 | unchanged | 1.8835782 |
| A19 | B19 | B | area between curve and x-axis (in 2 parts) | A19_B19 | unchanged | 1.7505150 |
| NA | B11 | C | using max and min functions | A0_B11 | added | 1.7145125 |
| A13 | B13 | A | equation of tangent | A13_B13 | unchanged | 1.6270792 |
| A12 | B12 | A | find point with given slope | A12_B12 | unchanged | 1.6091576 |
| A16 | B16 | A | trig chain rule | A16_B16 | unchanged | 1.5859112 |
| A7 | B6 | B | graphical vector sum | A7_B6 | unchanged | 1.5205341 |
| A9 | B8 | A | simplify logs | A9_B8 | unchanged | 1.4153751 |
| A10 | B9 | B | identify graph of rational functions | A10_B9 | unchanged | 1.3740964 |
| A11 | NA | A | summing arithmetic progression | A11_B0 | removed | 1.3213868 |
| A3 | B2 | B | composition of functions | A3_B2 | unchanged | 1.1454766 |
| A18 | B18 | A | definite integral | A18_B18 | unchanged | 1.0917384 |
| A5 | B4 | A | trig double angle formula | A5_B4 | unchanged | 0.9525037 |
| A6 | B5 | A | trig wave function | A6_B5 | unchanged | 0.9523906 |
| NA | B7 | B | angle between 3d vectors (in context) | A0_B7 | added | 0.8776552 |
| A15 | B15 | A | find and classify stationary points of cubic | A15_B15 | unchanged | 0.8009079 |
| A2 | NA | A | find intersection of lines | A2_B0 | removed | 0.6104580 |
| A8 | NA | A | compute angle between 3d vectors | A8_B0 | removed | 0.5797480 |
| A1 | B1 | A | properties of fractions | A1_B1 | unchanged | 0.5687239 |
item_info_data %>%
mutate(item = as.character(item)) %>%
left_join(test_versions %>% mutate(item = as.character(label)), by = "item") %>%
group_by(theta) %>%
summarise(
items_pre = sum(ifelse(outcome %in% c("unchanged", "removed"), info_y, 0)),
items_post = sum(ifelse(outcome %in% c("unchanged", "added"), info_y, 0)),
items_added = sum(ifelse(outcome %in% c("added"), info_y, 0)),
items_removed = sum(ifelse(outcome %in% c("removed"), info_y, 0))
) %>%
pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y") %>%
mutate(panel = ifelse(items %in% c("pre", "post"), "overall", "changes") %>% fct_rev()) %>%
mutate(items = fct_relevel(items, "removed", "added", "pre", "post")) %>%
ggplot(aes(x = theta, y = info_y, colour = items)) +
geom_line() +
scale_colour_brewer(palette = "Paired") +
facet_wrap(vars(panel), scales = "free_y") +
labs(x = "Ability", y = "Information") +
theme_minimal()
ggsave("output/uoe_pre-vs-post_info-curves_A-vs-B.pdf", units = "cm", width = 14, height = 6)
The changes have increased the total information in the test:
info_old <- areainfo(irt_fit,
c(-4, 4),
which.items = test_versions %>% filter(outcome %in% c("unchanged", "removed")) %>% pull(item_num))
info_new <- areainfo(irt_fit,
c(-4, 4),
which.items = test_versions %>% filter(outcome %in% c("unchanged", "added")) %>% pull(item_num))
versions_info <- bind_rows("Version A" = info_old,
"Version B" = info_new,
.id = "version") %>%
select(-LowerBound, -UpperBound, -Info, -Proportion) %>%
mutate(mean = TotalInfo / nitems)
versions_info %>% gt() %>%
cols_label(
version = "Test version",
TotalInfo = "Total information",
nitems = "Number of items",
mean = "Mean information per item"
)
| Test version | Total information | Number of items | Mean information per item |
|---|---|---|---|
| Version A | 36.59921 | 20 | 1.829961 |
| Version B | 39.16340 | 20 | 1.958170 |
Since the GPCM is more complex than a dichotomous model, each question has a separate characteristic curve for each possible score:
trace_data <- probtrace(irt_fit, theta) %>%
as_tibble() %>%
bind_cols(theta = theta) %>%
pivot_longer(cols = -theta, names_to = "level", values_to = "y") %>%
left_join(mark_levels %>% select(item, level = levelname, score), by = "level") %>%
mutate(score = as.factor(score))
trace_data %>%
mutate(item = fct_reorder(item, parse_number(item))) %>%
ggplot(aes(x = theta, y = y, colour = score)) +
geom_line() +
scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
facet_wrap(vars(item)) +
labs(y = "Probability of response") +
theme_minimal()
To get a simplified picture for each question, we compute the expected score at each ability level:
expected_scores <- trace_data %>%
mutate(item = fct_reorder(item, parse_number(item))) %>%
group_by(item, theta) %>%
summarise(expected_score = sum(as.double(as.character(score)) * y), .groups = "drop")
expected_scores %>%
ggplot(aes(x = theta, y = expected_score, colour = item)) +
geom_line() +
scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
facet_wrap(vars(item)) +
labs(y = "Expected score") +
theme_minimal()
The resulting curves look quite similar to those from the 2PL, allowing for a similar interpretation. For instance, superimposing all the curves shows that there is a spread of difficulties (i.e. the values of theta where the expected score is 2.5 out of 5) and that some questions are more discriminating than others (i.e. have steeper slopes):
plt <- expected_scores %>%
ggplot(aes(x = theta, y = expected_score, colour = item, text = item)) +
geom_line() +
scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
labs(y = "Expected score") +
theme_minimal()
ggplotly(plt, tooltip = "text")
ggsave(plot = plt, file = "output/uoe_pre-vs-post_iccs-superimposed.pdf", width = 20, height = 14, units = "cm")
total_expected_score <- expected_scores %>%
group_by(theta) %>%
summarise(expected_score = sum(expected_score))
total_expected_score %>%
ggplot(aes(x = theta, y = expected_score)) +
geom_line() +
# geom_point(data = total_expected_score %>% filter(theta == 0)) +
# ggrepel::geom_label_repel(data = total_expected_score %>% filter(theta == 0), aes(label = round(expected_score, 1)), box.padding = 0.5) +
scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
labs(y = "Expected score") +
theme_minimal()
Here we redo the factor analysis, but using only the data from the new version of the test.
item_scores_B <- test_scores %>%
select(-F1) %>%
select(-contains("B0")) %>%
filter(test_version == "post") %>%
# select only the columns with question scores (names like Ax_Bx)
select(matches("A\\d+_B\\d+"))
The parameters package provides functions that run various checks to see if the data is suitable for factor analysis, and if so, how many factors should be retained.
structure <- check_factorstructure(item_scores_B)
n <- n_factors(item_scores_B)
The choice of 1 dimension is supported by 8 of the 23 methods (34.78%): t, p, Acceleration factor, R2, VSS complexity 1, Velicer’s MAP, TLI, RMSEA.
plot(n)
summary(n) %>% gt()
| n_Factors | n_Methods |
|---|---|
| 1 | 8 |
| 2 | 5 |
| 3 | 1 |
| 4 | 3 |
| 5 | 1 |
| 13 | 1 |
| 17 | 1 |
| 18 | 3 |
#n %>% tibble() %>% gt()
The scree plot shows the eigenvalues associated with each factor:
fa.parallel(item_scores_B, fa = "fa")
## Parallel analysis suggests that the number of factors = 6 and the number of components = NA
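Parallel analysis compares the observed eigenvalues with those of random data of the same dimensions, retaining only factors whose eigenvalue beats the random benchmark. A minimal base-R sketch of the idea on simulated single-factor data (not the test scores):

```r
set.seed(1)
n <- 500  # respondents
p <- 10   # items

# Simulated item scores driven by one latent ability, plus noise
ability <- rnorm(n)
scores  <- sapply(1:p, function(i) 0.6 * ability + rnorm(n))

obs_eigen <- eigen(cor(scores))$values

# Average eigenvalues of purely random data of the same size
rand_eigen <- rowMeans(replicate(
  50, eigen(cor(matrix(rnorm(n * p), n, p)))$values
))

# Number of factors whose eigenvalue exceeds the random benchmark
sum(obs_eigen > rand_eigen)
```

With genuinely one-dimensional data, only the first observed eigenvalue should clear the benchmark; fa.parallel implements a refined version of this comparison.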
Based on the n_factors summary, there is clear support for a 1-factor solution, although the parallel analysis suggests more. We also consider the 2-factor solution.
Show factanal output
fitfact <- factanal(item_scores_B, factors = 1, rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
##
## Call:
## factanal(x = item_scores_B, factors = 1, rotation = "varimax")
##
## Uniquenesses:
## A1_B1 A3_B2 A4_B3 A5_B4 A6_B5 A7_B6 A9_B8 A10_B9 A12_B12 A13_B13
## 0.89 0.76 0.68 0.80 0.86 0.72 0.75 0.75 0.72 0.69
## A14_B14 A15_B15 A16_B16 A17_B17 A18_B18 A19_B19 A20_B20 A0_B7 A0_B10 A0_B11
## 0.59 0.86 0.77 0.74 0.80 0.68 0.71 0.85 0.65 0.69
##
## Loadings:
## [1] 0.57 0.53 0.50 0.50 0.52 0.56 0.64 0.51 0.57 0.54 0.59 0.56 0.33 0.49 0.45
## [16] 0.37 0.38 0.48 0.45 0.39
##
## Factor1
## SS loadings 5.06
## Proportion Var 0.25
##
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 2339.14 on 170 degrees of freedom.
## The p-value is 0
load <- tidy(fitfact)
ggplot(load, aes(x = fl1, y = 0)) +
geom_point() +
geom_label_repel(aes(label = paste0("A", rownames(load))), show.legend = FALSE) +
labs(x = "Factor 1", y = NULL,
title = "Standardised Loadings",
subtitle = "Based upon correlation matrix") +
theme_minimal()
load %>%
select(question = variable, factor_loading = fl1) %>%
left_join(test_versions %>% select(question = label, description, MATH_group), by = "question") %>%
arrange(-factor_loading) %>%
gt() %>%
data_color(
columns = contains("factor"),
colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
) %>%
data_color(
columns = contains("MATH"),
colors = MATH_colours
)
| question | factor_loading | description | MATH_group |
|---|---|---|---|
| A14_B14 | 0.6369079 | find minimum gradient of cubic | B |
| A0_B10 | 0.5940682 | 2x2 system possibilities | B |
| A4_B3 | 0.5678270 | completing the square | A |
| A19_B19 | 0.5661081 | area between curve and x-axis (in 2 parts) | B |
| A0_B11 | 0.5579319 | using max and min functions | C |
| A13_B13 | 0.5575991 | equation of tangent | A |
| A20_B20 | 0.5416987 | product rule with given values | B |
| A7_B6 | 0.5315657 | graphical vector sum | B |
| A12_B12 | 0.5244957 | find point with given slope | A |
| A17_B17 | 0.5079599 | chain rule | A |
| A9_B8 | 0.5034418 | simplify logs | A |
| A10_B9 | 0.5004090 | identify graph of rational functions | B |
| A3_B2 | 0.4857094 | composition of functions | B |
| A16_B16 | 0.4845865 | trig chain rule | A |
| A5_B4 | 0.4508873 | trig double angle formula | A |
| A18_B18 | 0.4493676 | definite integral | A |
| A0_B7 | 0.3866100 | angle between 3d vectors (in context) | B |
| A15_B15 | 0.3777250 | find and classify stationary points of cubic | A |
| A6_B5 | 0.3748862 | trig wave function | A |
| A1_B1 | 0.3318861 | properties of fractions | A |
The questions that load most strongly on this factor are again dominated by the MATH Group B questions.
The new “in context” question, A0_B7, has a disappointingly low factor loading (and low information, as shown above). Perhaps the context is sufficiently routine for these students.
Here we also investigate the 2-factor solution, to see whether these factors are interpretable.
Show factanal output
fitfact2 <- factanal(item_scores_B, factors = 2, rotation = "varimax")
print(fitfact2, digits = 2, cutoff = 0.3, sort = TRUE)
##
## Call:
## factanal(x = item_scores_B, factors = 2, rotation = "varimax")
##
## Uniquenesses:
## A1_B1 A3_B2 A4_B3 A5_B4 A6_B5 A7_B6 A9_B8 A10_B9 A12_B12 A13_B13
## 0.89 0.75 0.68 0.80 0.85 0.71 0.74 0.75 0.72 0.69
## A14_B14 A15_B15 A16_B16 A17_B17 A18_B18 A19_B19 A20_B20 A0_B7 A0_B10 A0_B11
## 0.58 0.84 0.57 0.45 0.77 0.68 0.68 0.82 0.56 0.69
##
## Loadings:
## Factor1 Factor2
## A4_B3 0.50
## A14_B14 0.59
## A20_B20 0.53
## A0_B10 0.64
## A16_B16 0.63
## A17_B17 0.73
## A1_B1
## A3_B2 0.47
## A5_B4 0.36
## A6_B5 0.36
## A7_B6 0.50
## A9_B8 0.35 0.37
## A10_B9 0.40
## A12_B12 0.37 0.37
## A13_B13 0.44 0.34
## A15_B15 0.33
## A18_B18 0.41
## A19_B19 0.49
## A0_B7 0.41
## A0_B11 0.47
##
## Factor1 Factor2
## SS loadings 3.50 2.27
## Proportion Var 0.18 0.11
## Cumulative Var 0.18 0.29
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 934.49 on 151 degrees of freedom.
## The p-value is 3.79e-113
load2 <- tidy(fitfact2)
load2_plot <- load2 %>%
rename(question = variable) %>%
left_join(test_versions %>% rename(question = label), by = "question") %>%
ggplot(aes(x = fl1, y = fl2, colour = MATH_group, shape = MATH_group)) +
geom_point() +
geom_text_repel(aes(label = question), show.legend = FALSE, alpha = 0.6) +
labs(
x = "Factor 1 (of 2)",
y = "Factor 2 (of 2)"
) +
scale_colour_manual("MATH group", values = MATH_colours) +
scale_shape_manual(name = "MATH group", values = c(19, 17, 18)) +
theme_minimal()
load2_plot +
labs(
title = "Standardised Loadings",
subtitle = "Showing the 2-factor model"
)
ggsave("output/uoe_pre-vs-post_2factor.pdf", units = "cm", width = 12, height = 8, dpi = 300,
plot = load2_plot)
main_factors <- load2 %>%
# mutate(factorNone = 0.4) %>% # add this to set the main factor to "None" where all loadings are below 0.4
pivot_longer(names_to = "factor",
cols = contains("fl")) %>%
mutate(value_abs = abs(value)) %>%
group_by(variable) %>%
top_n(1, value_abs) %>%
ungroup() %>%
transmute(main_factor = factor, variable)
load2 %>%
select(-uniqueness) %>%
# add the info about which is the main factor
left_join(main_factors, by = "variable") %>%
left_join(test_versions %>% select(variable = label, description, MATH_group), by = "variable") %>%
arrange(main_factor) %>%
select(main_factor, everything()) %>%
# arrange adjectives by descending loading on main factor
rowwise() %>%
mutate(max_loading = max(abs(c_across(starts_with("fl"))))) %>%
group_by(main_factor) %>%
arrange(-max_loading, .by_group = TRUE) %>%
select(-max_loading) %>%
# sort out the presentation
rename("Main Factor" = main_factor, # the _ throws a latex error
"Question" = variable) %>%
mutate_at(
vars(starts_with("fl")),
~ cell_spec(round(., digits = 3), bold = if_else(abs(.) > 0.4, T, F))
) %>%
kable(booktabs = T, escape = F, longtable = T) %>%
kableExtra::collapse_rows(columns = 1, valign = "top") %>%
kableExtra::kable_styling(latex_options = c("repeat_header"))
| Main Factor | Question | fl1 | fl2 | description | MATH_group |
|---|---|---|---|---|---|
| fl1 | A0_B10 | 0.644 | 0.143 | 2x2 system possibilities | B |
| A14_B14 | 0.586 | 0.277 | find minimum gradient of cubic | B | |
| A20_B20 | 0.534 | 0.195 | product rule with given values | B | |
| A4_B3 | 0.502 | 0.27 | completing the square | A | |
| A7_B6 | 0.496 | 0.218 | graphical vector sum | B | |
| A19_B19 | 0.491 | 0.284 | area between curve and x-axis (in 2 parts) | B | |
| A0_B11 | 0.475 | 0.294 | using max and min functions | C | |
| A3_B2 | 0.469 | 0.181 | composition of functions | B | |
| A13_B13 | 0.436 | 0.339 | equation of tangent | A | |
| A0_B7 | 0.406 | 0.1 | angle between 3d vectors (in context) | B | |
| A10_B9 | 0.403 | 0.293 | identify graph of rational functions | B | |
| A5_B4 | 0.362 | 0.262 | trig double angle formula | A | |
| A6_B5 | 0.359 | 0.141 | trig wave function | A | |
| A1_B1 | 0.28 | 0.176 | properties of fractions | A | |
| fl2 | A17_B17 | 0.134 | 0.728 | chain rule | A |
| A16_B16 | 0.162 | 0.633 | trig chain rule | A | |
| A18_B18 | 0.259 | 0.409 | definite integral | A | |
| A12_B12 | 0.372 | 0.374 | find point with given slope | A | |
| A9_B8 | 0.347 | 0.374 | simplify logs | A | |
| A15_B15 | 0.228 | 0.328 | find and classify stationary points of cubic | A |
TODO interpretation
This report supports the analysis in the following paper:
[citation needed]
In this analysis we used the following packages. You can learn more about each one by clicking on the links below.